I'm writing this 30 minutes after the announcement of Sora, OpenAI's next-gen video generation model. Just as I was blown away the first time I saw GPT, I have the same feeling now. One video caught my attention: an animated clip of a giant cloud man hovering over the earth, angrily sending lightning down onto it. It really wasn't the most impressive video of the lot, but it was the English prompt that struck me:
"A giant, towering cloud in the shape of a man looms over the earth. The cloud man shoots lightning bolts down to the earth."
Why didn't Sora create a video of a pleasant, smiley cloud man, but instead a giant, muscular, angry-looking figure? It's because of the subtle connotations of the prompt: 'towering', 'looms'. Words that convey a sense of formidability.
These connotations feel particular to English; I'm not sure other languages have words that carry quite the same shades of meaning. It made me wonder: what about models trained on other world languages? Will those communities have to train language models as powerful as OpenAI's to keep pace? Do they have the resources to do so, or will they lag behind technologically while English-speaking nations happily use their English-trained AI models?
For example, if we imagine a future film industry that relies heavily on prompting AI to create new movies (something I find very plausible), how will an Arabic-speaking animator keep pace without learning English? And if he does learn it, how long before he masters its connotations well enough to create these subtle animation styles? You don't need to speak English to use After Effects, but you do need English to prompt a model.
I see only three options here: non-English cultures train their own models, new technology emerges to mediate between languages, or cultures start treating English as a critical professional skill.
Given the pace of technology, I find it hard to see other countries keeping up with the American big tech giants by training their own models. Furthermore, is there enough labelled digital data in other languages to make training such a model viable at all?
I think it is plausible that some new AI model will mediate between prompts, translating them without losing the subtle meaning carried by linguistic nuance.
I believe it is most likely that there will be greater investment in learning English instead. This has happened before in history: why is the QWERTY keyboard universally accepted around the world, despite being designed for languages that use the Latin alphabet? Even China, a technological superpower, had to adapt to the English keyboard to keep pace. The reality is that the cultures that found technologies tend to shape the cultures around them.
If this is true, what then happens to culture and heritage, which are so deeply shaped by language?